The Security of Latent Dirichlet Allocation
Abstract
Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), attackers have an incentive to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces the topics that the attacker wants the LDA user to see? Answering this question is important for characterizing such attacks and for developing defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using a descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.
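The abstract names a bilevel optimization formulation solved by a descent method with implicit functions but does not write it out. A generic sketch of how such a problem is usually posed may help. The notation is an assumption made for illustration and is not taken from the paper: $M_0$ is the clean corpus, $M$ the poisoned corpus, $\varphi^{*}$ the attacker's target topics, $\hat{\varphi}(M)$ the topics the learner returns on $M$, $\ell$ an attacker loss, $\mathcal{L}$ the learner's objective, and $B$ a budget on corpus changes.

```latex
% Upper level: the attacker picks a poisoned corpus M close to M_0.
% Lower level: the learner fits topics to whatever corpus it is given.
\begin{aligned}
\min_{M}\quad & \ell\bigl(\hat{\varphi}(M),\, \varphi^{*}\bigr) \\
\text{s.t.}\quad & \hat{\varphi}(M) \in \arg\max_{\varphi}\; \mathcal{L}(\varphi;\, M), \\
& \lVert M - M_0 \rVert_{1} \le B .
\end{aligned}

% If the lower-level solution satisfies a smooth stationarity condition
% g(\varphi, M) = \nabla_{\varphi}\mathcal{L}(\varphi; M) = 0 with an
% invertible Jacobian in \varphi, the implicit function theorem gives
\frac{\partial \hat{\varphi}}{\partial M}
  = -\left(\frac{\partial g}{\partial \varphi}\right)^{-1}
     \frac{\partial g}{\partial M},
\qquad
\nabla_{M}\,\ell
  = \left(\frac{\partial \hat{\varphi}}{\partial M}\right)^{\!\top}
    \nabla_{\varphi}\,\ell .
```

Under these assumptions, a descent method would repeatedly step $M$ along $-\nabla_{M}\,\ell$ and project back onto the budget constraint. Because the problem is non-convex, this reaches only a local optimum, in line with the abstract's "up to local optima".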
Similar resources
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian e-books in the field of science using LDA topic modeling, evaluating the keywords' similarity to a gold standard and users' views of the model's keywords. Methodology: This is a mixed text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Software Selection based on Quantitative Security Risk Assessment
Multiple software products often exist on the same server, so a vulnerability in one product might compromise the entire system. It is imperative to perform a security risk assessment when selecting the candidate software products that become part of a larger system. Having a quantitative security risk assessment model provides an objective criterion for such assessment and com...
以狄式分佈為基礎之多語聲學模型拆分及合併 (Multilingual Acoustic Model Splitting and Merging by Latent Dirichlet Allocation) [In Chinese]
Avoiding confusion between the phonetic acoustic models of different languages is one of the biggest challenges in multilingual speech recognition. We propose a method based on Latent Dirichlet Allocation to avoid this confusion. We split phonetic acoustic models based on tri-phones. And merging the group that selected by Latent Dirichlet Al...
Distributed Latent Dirichlet Allocation via Tensor Factorization
We describe a distributed implementation for Latent Dirichlet Allocation parameter estimation based upon the method of moments.
Experiments with Latent Dirichlet Allocation
Latent Dirichlet Allocation is a generative topic model for text. In this report, we implement collapsed Gibbs sampling to learn the topic model. We test our implementation on two data sets: classic400 and Psychological Abstract Review. We also discuss different ways of evaluating the models' goodness-of-fit and how parameter settings interact with it.
Publication date: 2015